Structure-Aware Sampling: Flexible and Accurate Summarization

Authors

  • Edith Cohen
  • Graham Cormode
  • Nick G. Duffield

Abstract

In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. The particular structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analysis of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, rapidly degrades when, as is often the case, the query spans multiple ranges. They are also less flexible, being targeted for range-sum queries alone, and are often quite costly to build and use.

In this paper we propose and evaluate variance optimal sampling schemes that are structure-aware. These summaries improve over the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexible summaries, with high accuracy on both range queries and arbitrary subset queries.
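To make the notion of a sample-based summary with unbiased subset-sum estimates concrete, here is a minimal Python sketch of a structure-oblivious baseline of the kind the abstract contrasts against. It is not the structure-aware scheme proposed in the paper. It assumes Poisson sampling with inclusion probabilities proportional to size (IPPS) and Horvitz-Thompson adjusted weights; the function names, the dictionary representation of keys and weights, and the integer key order are all illustrative assumptions.

```python
import random


def ipps_probabilities(weights, k):
    """Inclusion probabilities p_i = min(1, w_i / tau), with tau chosen so they sum to k.

    Assumes every weight is positive and k >= 1 (illustrative restrictions).
    """
    n = len(weights)
    if k >= n:
        return [1.0] * n
    order = sorted(weights, reverse=True)
    remaining = sum(order)
    # Treat the `big` largest weights as certain inclusions (p = 1) and
    # spread the remaining k - big expected slots over the other keys.
    for big in range(k):
        tau = remaining / (k - big)
        if order[big] <= tau:  # largest non-certain weight already has p <= 1
            break
        remaining -= order[big]
    return [min(1.0, w / tau) for w in weights]


def poisson_sample(items, k, rng):
    """Sample each key independently; return key -> adjusted weight w / p.

    `items` maps key -> positive weight. The expected sample size is k.
    """
    keys = list(items)
    probs = ipps_probabilities([items[key] for key in keys], k)
    return {
        key: items[key] / p
        for key, p in zip(keys, probs)
        if rng.random() < p
    }


def estimate_range_sum(sample, lo, hi):
    """Unbiased estimate of the total weight of keys in [lo, hi]."""
    return sum(adj for key, adj in sample.items() if lo <= key <= hi)


if __name__ == "__main__":
    data = {key: float(key + 1) for key in range(10)}  # toy ordered keys
    summary = poisson_sample(data, 4, random.Random(7))
    print("sampled keys:", sorted(summary))
    print("estimated sum over [2, 5]:", estimate_range_sum(summary, 2, 5))
```

Because each sampled key carries its weight divided by its inclusion probability, the same summary answers any subset-sum query, not just ranges. The sampling decision here ignores the key order entirely, which is the obliviousness to structure that the abstract describes.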


Similar resources

Graph Hybrid Summarization

One solution for processing and analyzing massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. To generate a better-quality summary for a given attributed graph, both structural and attribute similarities must be considered. There are two measures, named density and entropy, to evaluate the quality of structural and at...

Systematic literature review of fuzzy logic based text summarization

"Information overload" is not a new term, but the massive development in technology, which enables anytime, anywhere, easy, and unlimited access to, participation in, and publishing of information, has consequently escalated its impact. Assisting users' informational searches with reduced reading or surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...

Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems

The hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate, and flexible communications for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...

Application of Artificial Neural Networks for Analysis of Flexible Pavements under Static Loading of Standard Axle

In this study, an artificial neural network was developed in order to analyze flexible pavement structure and determine its critical responses under the influence of standard axle loading. In doing so, more than 10000 four-layered flexible pavement sections composed of asphalt concrete layer, base layer, subbase layer, and subgrade soil were analyzed under the impact of standard axle loading. P...

GJR-Copula-CVaR Model for Portfolio Optimization: Evidence for Emerging Stock Markets

This paper empirically examines the impact of the dependence structure between assets on the optimization of a portfolio composed of the Tehran Stock Exchange Price Index and the Borsa Istanbul 100 Index. In this regard, the Copula family of functions is proposed as a powerful and flexible tool to determine the structure of dependence. Finally, the impact of the dep...

Journal:
  • PVLDB

Volume 4

Publication date: 2011